Identification of a bottom quark-antiquark pair in a single jet
with high transverse momentum and its application
In this paper we introduce a new approach to identify a bottom quark-antiquark pair inside a single jet with high transverse momentum by using the jet substructure in the center-of-mass frame of the jet. We demonstrate that the method can be used to discriminate boosted heavy particles decaying to a $b\bar{b}$ final state from QCD jets. Applications to searches for the standard model Higgs boson (H) decaying to $b\bar{b}$ when produced in association with a weak vector boson are also discussed.
I. INTRODUCTION

The existence of a Higgs boson-like particle with a mass of around 125 GeV has been firmly established by both the ATLAS and CMS experiments [1, ?] in its bosonic decay modes ($H \to \gamma\gamma$, $H \to ZZ$, and $H \to WW$). However, its decay to a bottom quark-antiquark pair ($b\bar{b}$) final state has not been observed yet. Finding such a decay signal at the LHC is challenging because of the large background from multijet production containing b quarks, even though the $H \to b\bar{b}$ decay mode is predicted in the standard model (SM) to have a branching fraction of 58% for $m_H = 125$ GeV. As a result, such searches have mostly been performed in the $pp \to VH$ production mode, where V is either a W or a Z boson that decays leptonically, and $H \to b\bar{b}$. So far, no evidence of such a decay has been seen by either the ATLAS or CMS experiment [1, ?].

It has been shown that the search sensitivity for $H \to b\bar{b}$ in the VH production mode can be significantly improved by reconstructing the hadronically decaying Higgs boson with large transverse momentum in a single jet [3], especially together with the implementation of jet substructure techniques [?]. Such an approach requires the identification of both b quarks from the Higgs boson decay in a single jet, hereafter referred to as a Higgs jet (H jet). While the identification of an isolated jet stemming from the hadronization of a single b quark (b-tagging) has been widely used in many experimental measurements, its application to Higgs jets is not trivial since a Higgs jet has two b quarks inside. In this paper, we extend the studies presented in Refs. [?, ?] to explore the identification of the $b\bar{b}$ pair inside an H jet (double b-tagging) in the center-of-mass frame of the jet. We demonstrate that the method can greatly reduce the QCD jet background while maintaining a high identification efficiency for the boosted Higgs boson, even in an environment with a very large number of multiple interactions per event (pileup). Here the QCD jets are defined as jets initiated by a non-top quark or gluon.

We organize this paper as follows. In Sec. II we describe the event sample used in the study. Section III discusses the method to identify a bottom quark-antiquark pair in a single jet in the jet center-of-mass frame and its performance. Applications of our method to the searches for $H \to b\bar{b}$ in the VH production mode are discussed in Sec. IV. We conclude in Sec. V.


II. EVENT SAMPLE 

We use boosted H jets from the SM process of WH production as a benchmark to illustrate our proposed double b-tagging method. For simplicity, we only consider the background from SM W+jets production to study the rejection of QCD jets, as it is the largest background in searches for $H \to b\bar{b}$ in the WH production mode. However, our method is generic and is applicable to any boosted heavy particle decaying to a $b\bar{b}$ final state. In addition, we also generate events to simulate the SM processes of ZH, WZ, WW, ZZ, Z+jets, and top quark production.

All the events used in this analysis are produced using the Pythia 8.186 Monte Carlo (MC) event generator [?, ?] for pp collisions at 14 TeV center-of-mass energy. To simulate the finite resolution of the calorimeters of the LHC experiments, we divide the ($\eta$, $\phi$) plane into 0.1 × 0.1 cells. The energies of the particles entering each cell in each event, except for neutrinos, are summed and replaced with a massless pseudoparticle of the same energy, also referred to as an energy cluster, pointing to the center of the cell. These pseudoparticles are fed into the FastJet 3.0.1 [?] package for jet reconstruction. The jets are reconstructed with the anti-$k_T$ algorithm [13] with a distance parameter of $\Delta R = 0.8$. The anti-$k_T$ jet algorithm is the default one used by the ATLAS and CMS experiments. As for the charged tracks, their momenta and vertex positions are smeared according to the expected resolutions of the ATLAS detector [?]. To evaluate the performance of the double b-tagging under the currently expected experimental conditions at the LHC, we generate the MC events with different average numbers of multiple interactions per event, where the beam spot is assumed to follow a Gaussian distribution with a width of 0.015 mm in the transverse beam direction and 45 mm in the longitudinal beam direction. We then perform our studies for each scenario and compare their performance.
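The calorimeter granularity step described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the input format (a list of `(energy, eta, phi)` tuples with neutrinos already removed) and the function name `to_pseudoparticles` are assumptions made here, and the partial cell at the $\phi = 2\pi$ boundary is not treated specially.

```python
import math
from collections import defaultdict

CELL = 0.1  # cell size in eta and phi, as quoted in the text


def to_pseudoparticles(particles, cell=CELL):
    """Sum particle energies per (eta, phi) cell and return one massless
    pseudoparticle (E, px, py, pz) per cell, pointing to the cell center.

    `particles` is a list of (energy, eta, phi) tuples; the caller is
    assumed to have removed neutrinos already (hypothetical input format).
    """
    cells = defaultdict(float)
    for energy, eta, phi in particles:
        phi = phi % (2.0 * math.pi)  # map phi into [0, 2*pi)
        key = (math.floor(eta / cell), math.floor(phi / cell))
        cells[key] += energy

    clusters = []
    for (i_eta, i_phi), energy in cells.items():
        eta_c = (i_eta + 0.5) * cell  # cell center in eta
        phi_c = (i_phi + 0.5) * cell  # cell center in phi
        pt = energy / math.cosh(eta_c)  # massless: |p| = E
        clusters.append((energy,
                         pt * math.cos(phi_c),
                         pt * math.sin(phi_c),
                         pt * math.sinh(eta_c)))
    return clusters
```

The returned four-vectors are what a jet algorithm would then consume; building them as massless objects means the jet mass arises only from the angular spread of the energy clusters.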
III. DOUBLE b-TAGGING AND JET
SUBSTRUCTURE IN THE REST FRAME

A. Event selection 

In this section we describe the method to identify a 
bottom quark-antiquark pair in a single jet using the sub¬ 
structure in the center-of-mass frame of the jet in order to 
distinguish a boosted hadronically decaying Higgs boson 
from QCD jets. 

The study is done using MC simulated events of WH and W+jets production, where the W boson decays leptonically ($W \to \ell\nu$). We select events with one isolated lepton (electron or muon) with $p_T > 20$ GeV and $|\eta| < 2.4$, where $p_T$ and $\eta$ are the transverse momentum and pseudorapidity of the lepton. For the jet reconstruction, studies show that the jet energy and invariant mass ($m_{\rm jet}$) can shift significantly to higher values due to additional energy depositions from the underlying event and pileup. We employ a jet area correction technique [?] to take these effects into account on an event-by-event basis. For each event, a distribution of transverse energy densities is calculated for all jets with $|\eta| < 2.1$, and its median is taken as an estimate of the energy density of the pileup and underlying event. We subsequently correct each jet by subtracting the product of the transverse energy density and the jet area, which is determined with the “active” area calculation technique [?]. This method results in a modified jet four-momentum that is used throughout the paper unless explicitly stated otherwise. The jets with $p_T > 300$ GeV, $|\eta| < 1.7$, and 40 GeV < $m_{\rm jet}$ < 240 GeV in an event are selected as the H jet candidates for further analysis.
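The median-based area correction can be illustrated with a short sketch. It is deliberately simplified: the text describes a correction of the full jet four-momentum, whereas this example only corrects a scalar transverse energy. The dictionary keys (`et`, `eta`, `area`) and the floor at zero are assumptions introduced for this illustration.

```python
from statistics import median


def pileup_density(jets, eta_max=2.1):
    """Estimate the event energy density as the median of E_T / area
    over all jets with |eta| < eta_max."""
    densities = [j["et"] / j["area"] for j in jets
                 if abs(j["eta"]) < eta_max and j["area"] > 0.0]
    return median(densities) if densities else 0.0


def corrected_et(jet, rho):
    """Subtract rho * area from the jet transverse energy.
    Flooring at zero is a choice made here, not taken from the text."""
    return max(jet["et"] - rho * jet["area"], 0.0)
```

Using the median rather than the mean keeps the density estimate insensitive to the few hard jets in the event, which is why the hard-scatter jets themselves can be included in the list.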

For b-tagging, only charged tracks with $p_T > 1$ GeV and $|\eta| < 2.5$ are considered. They are also required to satisfy $|d_0| < 1$ mm and $|z_0 - z_{\rm pv}| \sin\theta < 1.5$ mm, where $d_0$ and $z_0$ are the transverse and longitudinal impact parameters of the charged track, $z_{\rm pv}$ is the longitudinal position of the primary vertex, and $\theta$ is the polar angle of the charged track. A charged track is considered to be associated with a jet if the distance $\Delta R = \sqrt{\Delta\eta^2 + \Delta\phi^2}$ is less than 0.8, where $\Delta\eta$ and $\Delta\phi$ are the differences in pseudorapidity and azimuthal angle between the charged track and the jet, respectively.
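As a sketch of the track selection and the $\Delta R$ association, assuming tracks are given as dictionaries with hypothetical keys `pt` (GeV), `eta`, `d0` and `z0` (mm):

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane, with delta-phi wrapped to [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def passes_track_cuts(trk, z_pv):
    """Apply the kinematic and impact-parameter cuts quoted in the text."""
    theta = 2.0 * math.atan(math.exp(-trk["eta"]))  # polar angle from eta
    return (trk["pt"] > 1.0
            and abs(trk["eta"]) < 2.5
            and abs(trk["d0"]) < 1.0
            and abs(trk["z0"] - z_pv) * math.sin(theta) < 1.5)


def is_associated(trk_eta, trk_phi, jet_eta, jet_phi, r_max=0.8):
    """A track belongs to a jet if it lies within r_max of the jet axis."""
    return delta_r(trk_eta, trk_phi, jet_eta, jet_phi) < r_max
```

The wrap of $\Delta\phi$ matters for jets near $\phi = \pm\pi$, where a naive difference would place nearby tracks almost $2\pi$ away.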

B. Center-of-mass frame of a jet 

We define the center-of-mass frame (rest frame) of a jet as the frame where the four-momentum of the jet is equal to $p^{\rm rest}_{\rm jet} = (m_{\rm jet}, 0, 0, 0)$. A jet consists of its constituent particles. The distribution of the constituent particles of a boosted Higgs jet in its center-of-mass frame looks like a back-to-back dijet event with one b quark in each of the subjets. On the other hand, a QCD jet acquires its mass through gluon radiation and is not a closed system. The constituent particle distribution of a QCD jet in the rest frame is more likely to be random, as illustrated in Fig. 1.





FIG. 1: Illustration of the constituent particle distribution of 
a jet. (a) H jet in the lab frame, (b) H jet in the jet rest 
frame, (c) QCD jet in its rest frame. 
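The transformation into the jet rest frame is an ordinary Lorentz boost with the velocity of the jet. A minimal sketch, with four-vectors written as `(E, px, py, pz)` tuples (a convention chosen here for illustration):

```python
import math


def boost_to_rest_frame(p, jet):
    """Boost four-vector p = (E, px, py, pz) into the rest frame of `jet`."""
    ej, jx, jy, jz = jet
    m = math.sqrt(ej * ej - jx * jx - jy * jy - jz * jz)  # jet mass
    bx, by, bz = jx / ej, jy / ej, jz / ej                # jet velocity
    b2 = bx * bx + by * by + bz * bz
    gamma = ej / m
    e, px, py, pz = p
    bp = bx * px + by * py + bz * pz
    # p' = p + [(gamma - 1) (beta.p) / beta^2 - gamma E] beta
    k = (gamma - 1.0) * bp / b2 - gamma * e if b2 > 0.0 else 0.0
    return (gamma * (e - bp), px + k * bx, py + k * by, pz + k * bz)
```

Applying the function to the jet four-momentum itself returns $(m_{\rm jet}, 0, 0, 0)$, and the boosted constituents sum to zero three-momentum, which is the defining property of the frame.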


C. Reclustering 

We recluster the energy clusters of a jet to reconstruct subjets in the jet rest frame using a modified $e^+e^-$ Cambridge jet reconstruction algorithm [?] in the FastJet 3.0.1 [?] package, in which the distance measure is replaced with $\Omega$, defined as the angle between two pseudoparticles in the jet rest frame. The algorithm performs a sequential recombination of the pair of pseudoparticles that is closest in angle $\Omega$; pairs with $\Omega > 0.8$ are not combined. The reconstructed subjets are required to have energy $E_{\rm subjet} > 10$ GeV in the H jet rest frame. We then boost all the tracks associated with the H jet candidate to the center-of-mass frame of the jet. A charged track is considered to be associated with a subjet only if their angular separation is less than 0.8 in the jet rest frame. By doing so, we separate the charged tracks that originate from the different partons of the Higgs boson decay and reject many charged tracks from the underlying event and pileup. This allows a straightforward identification of the b quarks inside the H jets by applying existing b-tagging algorithms to the charged tracks associated with each subjet. In our analysis, we only retain jets whose two highest-energy subjets (leading subjets) each have at least one associated charged track. Those two subjets are considered as the b and $\bar{b}$ subjet candidates of the H jet.
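The angular reclustering can be sketched as a plain sequential recombination. This is only an illustration of the idea, not the FastJet-based implementation used in the study: four-vector addition is assumed as the recombination scheme, any further steps of the $e^+e^-$ Cambridge algorithm are omitted, and the function names are introduced here.

```python
import math


def angle(a, b):
    """Opening angle between the three-momenta of a and b, each (E, px, py, pz)."""
    dot = a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
    na = math.sqrt(a[1] ** 2 + a[2] ** 2 + a[3] ** 2)
    nb = math.sqrt(b[1] ** 2 + b[2] ** 2 + b[3] ** 2)
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def recluster(clusters, omega_max=0.8, e_min=10.0):
    """Repeatedly merge the closest pair in angle until the closest pair
    is separated by more than omega_max; return subjets above e_min,
    ordered by decreasing energy (all quantities in the jet rest frame)."""
    objs = [tuple(c) for c in clusters]
    while len(objs) > 1:
        omega, i, j = min((angle(objs[i], objs[j]), i, j)
                          for i in range(len(objs))
                          for j in range(i + 1, len(objs)))
        if omega > omega_max:
            break
        merged = tuple(x + y for x, y in zip(objs[i], objs[j]))
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    subjets = [o for o in objs if o[0] > e_min]
    return sorted(subjets, key=lambda o: o[0], reverse=True)
```

For a boosted two-body decay, the two leading subjets returned by such a procedure are expected to be roughly back to back in the rest frame.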

D. Double b-tagging

FIG. 2: The subjet weight distributions of the transverse and longitudinal impact parameter weights under different pileup conditions. In (a) and (b), the solid (dashed) lines represent the distributions of the charged tracks associated with the subjets that have the highest energy in the H (QCD) jet rest frame. In (c) and (d), the solid (dashed) lines represent the distributions of the charged tracks associated with the subjets that have the second-highest energy in the H (QCD) jet rest frame. All the distributions are normalized to unity.

FIG. 3: The distributions of the number and invariant mass of the charged tracks that are associated with the subjets in the rest frame of the H (solid line) and QCD (dashed line) jets under different pileup conditions. In (a) and (b), the distributions are from the subjets with the highest energy in the jet rest frame. In (c) and (d), the distributions are from the subjets with the second-highest energy in the jet rest frame. All the distributions are normalized to unity.

In this paper, we illustrate the double b-tagging in the jet rest frame with a tagging algorithm based on the charged track impact parameters, since this algorithm is widely used in many experiments and is easy to implement. It is also among the official b-tagging methods used by the ATLAS experiment [21]. The impact parameters of tracks are computed with respect to the primary vertex in the lab frame. They typically have significant nonzero values for charged tracks from b-hadron decays because of the long lifetime of the b hadron. The impact parameter is signed to further discriminate tracks from b-hadron decays from tracks originating from the primary vertex, based on the fact that the decay position of the b hadron lies along its flight path. The sign of the transverse impact parameter $d_0$ is determined using the subjet momentum $\vec{p}_{\rm subjet}$, the track momentum $\vec{p}_{\rm trk}$ at the point of closest approach $\vec{r}_{\rm trk}$ [?] to the primary vertex, and the primary vertex position $\vec{r}_{\rm pv}$:

$$\mathrm{sign}(d_0) = (\vec{p}_{\rm subjet} \times \vec{p}_{\rm trk}) \cdot (\vec{p}_{\rm trk} \times (\vec{r}_{\rm pv} - \vec{r}_{\rm trk})). \qquad (1)$$

The sign of the longitudinal impact parameter $z_0$ is given by the sign of $(\eta_{\rm subjet} - \eta_{\rm trk}) \times z_{0,{\rm trk}}$, where $\eta_{\rm subjet}$ is the pseudorapidity of the subjet, and $\eta_{\rm trk}$ and $z_{0,{\rm trk}}$ are the pseudorapidity and longitudinal impact parameter of the track at the position $\vec{r}_{\rm trk}$, respectively. All the quantities in the computation of the signed impact parameters are defined in the lab frame.
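The sign conventions of Eq. (1) and of the longitudinal impact parameter can be written out directly. The sketch below uses plain three-component tuples for the vectors; the function names and argument order are chosen here for illustration.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def signed_d0(d0, p_subjet, p_trk, r_pv, r_trk):
    """Attach the sign of Eq. (1) to the magnitude of d0.
    A vanishing product is treated as positive here (a choice of this sketch)."""
    diff = tuple(x - y for x, y in zip(r_pv, r_trk))
    s = dot(cross(p_subjet, p_trk), cross(p_trk, diff))
    return math.copysign(abs(d0), s)


def signed_z0(z0_trk, eta_subjet, eta_trk):
    """Attach the sign of (eta_subjet - eta_trk) * z0_trk to |z0_trk|."""
    return math.copysign(abs(z0_trk), (eta_subjet - eta_trk) * z0_trk)
```

As a check of the convention, take a subjet along +x and a track with direction (1, 1, 0) produced at the point (1, 0, 0), that is, downstream of the primary vertex along the subjet axis. Its point of closest approach to the origin is (0.5, -0.5, 0), and `signed_d0` is positive; the mirrored decay point (-1, 0, 0) gives a negative value.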

We form a likelihood from the charged tracks associated with a subjet. The measured impact parameter significance $S_i$ of the $i$th track in a subjet is compared to predefined functions for the b subjet and non-b subjet hypotheses, $b(S_i)$ and $u(S_i)$, where $b(S)$ and $u(S)$ are the smoothed and normalized distributions for the charged tracks associated with the b subjets in the signal H jets and with the subjets in the QCD jets, respectively. The ratio of the probabilities $b(S_i)/u(S_i)$ defines a weight $W_i$. A subjet weight $W_{\rm subjet}$ is then calculated as the sum of the $W_i$ over all the charged tracks associated with the subjet. If no charged tracks are associated with a subjet, its subjet weight is set to zero. The distributions of the subjet weights of the H and QCD jets are shown in Fig. 2. They show a clear separation between the signal and background distributions.
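The subjet weight defined above can be sketched with binned templates standing in for the smoothed functions $b(S)$ and $u(S)$. The `Template` class, the bin edges, and the bin contents used in the usage note are hypothetical and serve only to make the example concrete.

```python
from bisect import bisect_right


class Template:
    """Piecewise-constant stand-in for a smoothed, normalized distribution."""

    def __init__(self, edges, values):
        assert len(edges) == len(values) + 1
        self.edges = list(edges)
        self.values = list(values)

    def __call__(self, s):
        # Locate the bin containing s; clamp under- and overflow to the edge bins.
        idx = bisect_right(self.edges, s) - 1
        idx = max(0, min(idx, len(self.values) - 1))
        return self.values[idx]


def subjet_weight(significances, b_pdf, u_pdf):
    """Sum of the per-track weights W_i = b(S_i) / u(S_i).
    An empty track list gives zero, as stated in the text."""
    return sum(b_pdf(s) / u_pdf(s) for s in significances)
```

For example, with `edges = [-10, 0, 5, 40]`, `b` values `[0.2, 0.3, 0.5]` and `u` values `[0.5, 0.4, 0.1]`, two tracks with significances 1.0 and 7.0 give a weight of 0.3/0.4 + 0.5/0.1. The template for $u$ must be nonzero wherever tracks can fall, which is one reason for smoothing the distributions.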

To further help identify subjets that originate from a b quark, we explore two additional properties of the subjet: the number and the invariant mass of the charged tracks that are associated with the subjet. Their distributions for the H and QCD jets are shown in Fig. 3.

The final double b-tagging variable is constructed using a boosted decision tree (BDT) algorithm whose inputs are the subjet weights of the two leading subjets in the jet rest frame and the numbers and invariant masses of the charged tracks associated with those two subjets. The signal efficiency for H jets, obtained by identifying the bottom quark and antiquark inside them, versus the background rejection of QCD jets for the BDT variable is shown in Fig. 4. Note that the performance of the double b-tagging is slightly better with higher pileup at certain signal efficiencies. This is an effect caused by the selection of the jets used in the evaluation of the double b-tagging performance. In our analysis, we only use jets that have $p_T > 300$ GeV, 40 < $m_{\rm jet}$ < 250 GeV, and at least two subjets with $E_{\rm subjet} > 10$ GeV and at least one associated charged track in the jet rest frame. As a result, when the pileup increases, some QCD jets that otherwise would not pass the jet selection criteria can be selected. We repeat the studies of double b-tagging for jets selected with a higher $p_T$ requirement (up to 1 TeV) and observe no degradation of the performance. For a given signal efficiency, the background rejection is actually slightly higher for jets with higher transverse momentum. This is primarily because the displaced decay vertices of b hadrons in higher-$p_T$ jets are farther away from the beam spot, as they have larger Lorentz boosts.

FIG. 4: The background rejection of QCD jets vs. the signal efficiency of H jets in different pileup conditions based on the double b-tagging without (a) and with (b) using the jet substructure information in its rest frame.


E. Jet substructure 

The jet substructure information can be used to improve the identification of the boosted $H \to b\bar{b}$ in addition to the double b-tagging. Here we demonstrate this by combining the double b-tagging with the jet substructure variables defined in the jet rest frame, introduced in Ref. [?]. They are the thrust, thrust-minor, sphericity, aplanarity, and the Fox-Wolfram moment $R_2$. Those variables are designed to identify a boosted heavy particle undergoing a two-body decay whose decay products are reconstructed in a single jet. They have been successfully used by the ATLAS experiment to make the first observation of boosted hadronically decaying vector bosons reconstructed as single jets in SM W/Z+jets production [23]. In addition, we introduce another variable, $\cos\theta$, where $\theta$ is defined as the angle between the direction of the thrust axis of a jet in its rest frame [?] and the jet momentum direction.
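As an illustration of one of these shape variables, the sketch below evaluates $R_2 = H_2/H_0$ from a list of three-momenta in the jet rest frame, using the standard Fox-Wolfram definition $H_l = \sum_{i,j} |\vec{p}_i||\vec{p}_j| P_l(\cos\theta_{ij}) / E_{\rm vis}^2$, where the normalization cancels in the ratio. Whether the study uses exactly this normalization is not stated in the text, so this should be read as an assumption.

```python
import math


def fox_wolfram_r2(momenta):
    """Ratio H2/H0 of Fox-Wolfram moments for a list of (px, py, pz)."""
    norms = [math.sqrt(px * px + py * py + pz * pz) for px, py, pz in momenta]
    h0 = 0.0
    h2 = 0.0
    for a, na in zip(momenta, norms):
        for b, nb in zip(momenta, norms):
            if na == 0.0 or nb == 0.0:
                continue  # skip zero-momentum entries
            c = (a[0] * b[0] + a[1] * b[1] + a[2] * b[2]) / (na * nb)
            h0 += na * nb                             # P_0(c) = 1
            h2 += na * nb * 0.5 * (3.0 * c * c - 1.0)  # P_2(c)
    return h2 / h0
```

A perfectly back-to-back two-particle configuration gives $R_2 = 1$, while six equal momenta along the $\pm x$, $\pm y$, and $\pm z$ axes give $R_2 = 0$. This is the behavior that separates a two-body decay seen in its rest frame from a more isotropic distribution of constituents.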

We form a BDT variable using the jet substructure variables described above together with the variables used in the double b-tagging in Sec. III D. Studies show that the jet substructure variables calculated from the energy clusters depend strongly on the pileup condition because of the additional energy depositions from pileup and the underlying event. To minimize this effect, the jet substructure variables in this paper are all computed using the charged tracks associated with the jet. Their distributions for the H jets and the QCD jets under different pileup conditions can be found in Appendix A. As shown in Fig. 4, the background rejection achieved by combining the double b-tagging and the jet substructure variables in the jet rest frame is a factor of 2 to 3 better than that obtained with the double b-tagging alone.

IV. APPLICATION 

In this section, we study two examples of the application of the double b-tagging algorithm in searches for $H \to b\bar{b}$ in the VH production modes, where the W/Z boson decays leptonically. For simplicity, we only consider the kinematic region of VH production where the Higgs boson has a relatively high transverse momentum, so that its hadronic decay products can be reconstructed in a single jet. In both examples, we assume an average of 50 pileup interactions per event at the LHC.
A. Search for $H \to b\bar{b}$ in the WH production mode

In this search channel, the leptonically decaying W boson is reconstructed by requiring exactly one isolated lepton with $p_T > 20$ GeV and $|\eta| < 2.5$, and more than 25 GeV of missing transverse energy in the event.

We then select jets with $p_T > 300$ GeV, $|\eta| < 1.7$, and 40 < $m_{\rm jet}$ < 240 GeV in the event as the hadronically decaying Higgs boson candidates. The jets are reconstructed with the anti-$k_T$ algorithm with a distance parameter $\Delta R = 0.8$. To reduce the large background from SM W+jets production, a selection on the BDT variable based on the double b-tagging and the jet substructure information, as described in Sec. III E, is applied. We optimize the selection cut on the BDT variable by maximizing $S/\sqrt{B}$, where S and B are the numbers of signal and background events within 20 GeV of the Higgs boson mass. In addition, we reject an event if it has a b jet that does not overlap with the selected H jet candidate. This selection significantly reduces the background from SM $t\bar{t}$ production. The jet mass distribution of the jet candidates after all the event selection criteria are applied is shown in Fig. 5. The significance $S/\sqrt{B}$ in the signal window is about 4, assuming 400 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy.
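The cut optimization amounts to a one-dimensional scan of $S/\sqrt{B}$ over candidate thresholds on the BDT output. A minimal sketch, assuming the events in the mass window are available as lists of hypothetical `(bdt_score, weight)` pairs:

```python
import math


def best_cut(signal, background, cuts):
    """Return (cut, significance) maximizing S / sqrt(B) for events with
    bdt_score > cut. `signal` and `background` are lists of
    (bdt_score, weight) for events inside the mass window."""
    best = (None, 0.0)
    for cut in cuts:
        s = sum(w for score, w in signal if score > cut)
        b = sum(w for score, w in background if score > cut)
        if b <= 0.0:
            continue  # S / sqrt(B) is undefined without background
        z = s / math.sqrt(b)
        if z > best[1]:
            best = (cut, z)
    return best
```

Thresholds that leave no background are skipped because the figure of merit diverges there; in practice such a region would call for more simulated events or a different figure of merit.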


B. Search for $H \to b\bar{b}$ in the ZH production mode

In the search channel of the ZH production mode, the boosted $H \to b\bar{b}$ decay is reconstructed in a single jet using the anti-$k_T$ algorithm with a distance parameter $\Delta R = 0.8$. We require the jets to have $p_T > 300$ GeV, $|\eta| < 1.7$, and 40 < $m_{\rm jet}$ < 200 GeV. The Z boson is reconstructed in the final states $Z \to \ell\ell$ ($\ell = e, \mu$) and $Z \to \nu\bar{\nu}$. Candidates for $Z \to \ell\ell$ decays are selected by combining an isolated, oppositely charged pair of electron or muon tracks and requiring the dilepton invariant mass to be within 20 GeV of the Z boson mass. The leptons are also required to have $p_T > 20$ GeV and $|\eta| < 2.5$. The identification of $Z \to \nu\bar{\nu}$ decays is done by requiring $E_T^{\rm miss} > 300$ GeV and $\Delta\phi(E_T^{\rm miss}, {\rm jet}) > 3$, where $E_T^{\rm miss}$ is the missing transverse energy and $\Delta\phi(E_T^{\rm miss}, {\rm jet})$ is the azimuthal angle between the directions of the $E_T^{\rm miss}$ and the momentum of the selected H jet candidate. Events with a b jet that does not overlap with the selected H jet candidate are rejected. After applying the above selection criteria, the dominant remaining background consists of events from SM Z+jets production, where $Z \to \ell\ell, \nu\bar{\nu}$ and the recoiling jet is misidentified as an H jet candidate. This background is greatly reduced by using the BDT variable based on the double b-tagging and the jet substructure variables as described in Sec. III E. We optimize the selection cut on the BDT variable by maximizing the signal significance $S/\sqrt{B}$.

FIG. 5: Jet mass distribution of the selected H jet candidates in the MC simulated event sample that is equivalent to 400 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy after all the event selection criteria are applied. The open histogram represents the expected contribution of the WH signal events. The left hashed histogram represents the expected contribution of WZ production, where $Z \to b\bar{b}$ is also reconstructed as a single jet. The right hashed histogram shows the expected background, which is dominated by top production (> 80%) with a significant contribution from W+jets production. The peaking structure around 160 GeV is from hadronically decaying top quarks from SM top production.

FIG. 6: Jet mass distribution of the selected H jet candidates in the MC simulated event sample that is equivalent to 400 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy after all the event selection criteria are applied. The open histogram represents the expected contribution of the ZH signal events. The left hashed histogram represents the expected contribution of ZZ production, where $Z \to b\bar{b}$ is also reconstructed in a single jet. The right hashed histogram shows the expected combinatorial background, which is dominated by Z+jets production. The solid black curve shows the final fit to the MC data. The dashed lines show each of the PDF components: the signal (blue), ZZ production (green), and the combinatorial background (red).

The signal yield is extracted by a binned likelihood fit to the $m_{\rm jet}$ distribution of the selected H jet candidates, as shown in Fig. 6. The probability density functions (PDFs) of the $H \to b\bar{b}$ and $Z \to b\bar{b}$ signals are modeled as two Gaussian functions. The combinatorial background PDF is parametrized by a bifurcated Gaussian function that has different widths on the left and right sides of the mean. The existence of the $Z \to b\bar{b}$ signal peak from SM ZZ production in this search channel provides an excellent calibration sample to constrain the $H \to b\bar{b}$ PDF parameters. In an actual data analysis at the LHC experiments, the parameters of the $Z \to b\bar{b}$ PDF can be precisely determined from data by studying boosted hadronically decaying Z bosons from Z+jets production [?]. The background PDF can also be constrained using events from multijet production. In the default fit, the means of the two Gaussian functions are allowed to float with a constant difference that is fixed to the MC-predicted mass difference between the Z and Higgs bosons. The widths of the two Gaussian functions are set to the values predicted by the MC simulation. The mean of the bifurcated Gaussian function is free in the fit, while its widths are fixed to the MC-predicted values. The fit result for the MC simulated event sample that is equivalent to 400 fb$^{-1}$ of LHC data at 14 TeV center-of-mass energy is shown in Fig. 6. The fit yields more than $5\sigma$ of significance for both the $H \to b\bar{b}$ and $Z \to b\bar{b}$ signals.
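The shape model used in the fit can be sketched as follows. The PDFs are normalized over the full real line here, whereas a real fit would normalize them over the fitted mass range; the function and parameter names are introduced for this illustration, and no fitted values from the study are reproduced.

```python
import math


def gauss(x, mean, sigma):
    """Unit-normalized Gaussian."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def bifurcated_gauss(x, mean, sigma_l, sigma_r):
    """Gaussian with different widths left and right of the mean,
    normalized to unit area over the real line."""
    sigma = sigma_l if x < mean else sigma_r
    norm = math.sqrt(2.0 / math.pi) / (sigma_l + sigma_r)
    return norm * math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def model(x, n_h, n_z, n_bkg, m_z, dm, sig_h, sig_z, mu_b, sig_bl, sig_br):
    """Expected event density at jet mass x.
    The Higgs peak is tied to the Z peak by the fixed offset dm,
    so only m_z floats for the two signal means."""
    return (n_h * gauss(x, m_z + dm, sig_h)
            + n_z * gauss(x, m_z, sig_z)
            + n_bkg * bifurcated_gauss(x, mu_b, sig_bl, sig_br))
```

Tying the two peak positions together through `dm` lets the better-populated $Z \to b\bar{b}$ peak constrain the jet mass scale for the Higgs peak, which is the calibration role described in the text.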


V. CONCLUSION 

In this paper we study the identification of a bottom quark-antiquark pair inside a single jet with high transverse momentum by using the jet substructure in the center-of-mass frame of the jet. We demonstrate that the method can significantly reduce the QCD jet background while maintaining a high identification efficiency for the boosted Higgs boson decaying to a $b\bar{b}$ pair, even under very large pileup conditions. The study shows good prospects for searches for the $H \to b\bar{b}$ decay in the VH production mode by the LHC experiments at 14 TeV center-of-mass energy, and it is complementary to the existing searches [1, 13], in which each of the b quarks from the Higgs boson decay is reconstructed as an individual jet. The proposed technique can also be used to search for new physics phenomena beyond the SM, such as possible dark matter candidates produced in association with the SM Higgs boson [?].
Appendix A: Jet substructure distribution






FIG. 7: The distributions of the jet shape variables in the center-of-mass frame of the jet: (a) thrust, (b) thrust-minor, (c) sphericity, (d) aplanarity, (e) $R_2$, and (f) $\cos\theta$ for the H jet signal and QCD jet background. All the distributions are normalized to unity.